Introduction

The use of satellite imaging to remotely detect areas of high risk for transmission of 
infectious disease is an appealing prospect for large-scale monitoring of these diseases. 
The detection of large-scale environmental determinants of disease risk, often called 
landscape epidemiology, has been motivated by several authors (Pavlovsky 1966; Meade 
et al. 1988). The basic notion is that large-scale factors such as population density, air 
temperature, hydrological conditions, soil type, and vegetation can determine in a coarse 
fashion the local conditions contributing to disease vector abundance and human contact 
with disease agents. These large-scale factors can often be remotely detected by sensors 
or cameras mounted on satellite or aircraft platforms and can thus be used in a predictive 
model to mark high risk areas of transmission and to target control or monitoring efforts. 
Reviews of satellite technologies for this purpose were recently presented by Washino 
and Wood (1994), Hay (1997), and Hay et al. (1997). 



In China, there is currently concern about the establishment and spread of infectious 
diseases, including malaria and schistosomiasis, in the area along the Yangtze upstream 
of the Three Gorges Dam, which is now under construction. Our group has been working 
with parasitologists from the Sichuan Institute of Parasitic Disease (SIPD) responsible for 
schistosomiasis monitoring and control in the area of the dam. The profound ecological 
and social changes that will take place as the dam is being constructed and when it is 
completed may create new habitat for the snail species central to the cycling of the 
disease, as well as new relationships between humans, domestic animals and the aquatic 
environment. The size of the lake that will be created behind the dam and the difficulty 
of access to this mountainous area make remote sensing technology an attractive adjunct 
to land based surveillance of these changes as the lake fills and the dam goes into 
operation. 

As a means of exploring the use of remote sensing in the context of schistosomiasis 
control prior to the completion of the Three Gorges Dam, we have been studying a 
region where the disease is endemic, where ground based data sets on its prevalence and 
on snail habitat exist, and which is of a scale suitable for study using remote sensing. 

With the assistance of our colleagues in the SIPD, we have focused on the area along the 
Anning River in the Daliang mountainous area of southwestern Sichuan province. This 
region includes villages studied in our earlier work. 

Remote sensing has been demonstrated to be a viable means of identifying habitat for 
vectors of other diseases. The potential efficacy for using remote sensing to determine 
high-risk areas of malaria transmission was recently illustrated (Beck et al. 1994; 1997). 
Two types of Anopheline mosquito habitat, unmanaged pastures and transitional swamps, 
were shown to be detectable based on classification of Landsat Thematic Mapper (TM) 
data. That research was an extension of previous work which focused on the 
identification of high and low Anopheline-producing rice fields (Wood et al. 1991). 
Landsat TM data have also been used to map land cover to study landscape correlates of 
Lyme disease (Dister et al. 1993). In that study disease data and landscape classifications 
were overlaid to look for land cover correlates to disease risk. 

Several studies have implied that remote sensing could be a useful tool for 
schistosomiasis monitoring. Cross and Bailey (1984) and Cross et al. (1984) showed a 
correlation between local temperature variation and prevalence rate. Malone et al. (1994) 
showed that historical prevalence data correlated well with remotely detectable 
geographic features. These studies take a different approach from the Anopheline 
studies in that they demonstrated a correlation between disease and ecological factors, 
whereas the malaria vector studies by Beck et al. (1994; 1997) and Wood et al. (1991) 
remotely sensed habitat correlates of the vector known to transmit the disease agent. 

In the current study we ask whether the second approach is applicable to detecting spatial 
variations in the vector population that transmits the parasite causing schistosomiasis 
japonica, the Asian form of schistosomiasis. The disease is vectored by an amphibious 
snail, Oncomelania hupensis. A recent preliminary study by the SIPD used Advanced 
Very High Resolution Radiometer (AVHRR) data to identify snail habitat (Li et al. 
1990). In the current analysis we use higher resolution Landsat TM data to look for 
correlations with detailed ground based snail ecology surveys. If surveyed snail habitats 
correlate with the satellite data, there is the potential to use remote sensing to monitor 
large and remote areas in the region of the dam, and to identify areas at high risk of 
transmission. 

The current problem differs from that of detecting malaria vectors because the vector 
habitat for O. hupensis is usually a micro-environment that is itself not detectable using 
most remote sensing data because of their coarse spatial resolution. However, 
micro-environmental conditions may be affected by larger scale factors including local 
vegetation type and surrounding crops, fertilizer usage, and water and temperature 
patterns. These factors will cause local changes in the environment, which in turn will 
influence the remote sensing signal. Further, the schistosomiasis studies cited above found 
correlations between large-scale phenomena and disease rates, implying that something 
can be seen at this scale. The question addressed at present is whether remote 
sensing data of local areas can be accurately classified, based on large-scale 
environmental factors, as suitable or unsuitable for these vector snails, and thus whether 
areas at high risk for transmission can be identified. 

Methods 

To address this issue our group conducted a study in the Anning River Valley in 
southwestern Sichuan Province. The Anning River Valley is a high mountain valley at an 
elevation of about 1,500 meters. This is primarily an agricultural area with irrigated 
farming of rice, corn, wheat, and a variety of vegetables and some export crops. The 
valley is also a highly endemic area for schistosomiasis japonica. The remote sensing 
data used were from the Landsat TM sensor. The ground data indicating suitable snail 
habitat were point observations from one environment type, each classified as habitat or 
non-habitat. Suitability was determined by the presence of young or reproducing snails 
vs. no young or reproducing snails. Few locations are found with only adult snails 
present, presumably because snails leave unsuitable locations or die. 

A large-scale snail monitoring effort was conducted in 1994 by the Xichang County 
Anti-endemic Station (XCAS). The station is responsible for monitoring and controlling 
human schistosomiasis infection and vector snail ecology in the seventeen-township 
middle section of the Anning River Valley. Snail surveys were performed throughout the 
area in townships where the human incidence exceeded 10%. Snail surveillance was 
done in June. We chose this section of the river valley as our study area in order to take 
advantage of these existing surveillance data. The study area extends from Lizhou 
township in the north to Hexi township in the south, and covers about 45 km of the river 
valley around Xichang City. A map of the area with these reference points is shown 
in Figure 1. 






Figure 1: Map of the Anning River Valley Study Area. Non-habitat sites are 
shown as gray pixels. Snail habitat sites are shown in black pixels. 



Two Landsat TM scenes (one from spring, April 7, 1994, and one from fall, October 16, 1994) 
were obtained for the region. Both images were free of cloud cover over the area of interest, 
and each represents a distinct agricultural season. The major crops during these times are 
rice and corn in the summer-fall season and wheat and beans in the winter-spring season. 

Ground data on the locations of snail colonies were obtained from the XCAS’s 1994 snail 
surveys (these are being supplemented with density information). During 10 days in the 
middle of June 1997, our group, with the help of the local authorities and the head of the 
XCAS, visited townships and recorded the geographic locations of the 1994 surveillance 
data. Collection sites were located with a Trimble Pro XL global positioning system to 
allow for correlation with the remote sensing data. Three base stations were established 
and positioned with respect to a known surveyed control point at the peak of the Lushan 
mountain southeast of Xichang city. All data points were differentially corrected to the 
base station locations to provide positioning accuracy in the 1-5 m range. 

Collection sites were located in 14 townships throughout the study area. Townships were 
chosen based on the availability of 1994 data or on historical knowledge of 
apparently stable snail habitat or non-habitat. Three environment types exist in the study 
area: irrigated farming in the river plain, terraced rice culture at the base of the hills, and 
mountain stream areas higher in the mountains. The three environment types are structurally 
different, with distinct local ecologies. In light of this, the study was limited to one type 
of environment, irrigated farming areas in the river plain, for which there was an 
abundance of ground/field data (and travel was more convenient). Snail habitat in the 
river plain area is limited to irrigation and drainage ditches and the boundaries of fields. 
This resulted in a total of 103 data points (55 classified as habitat and 48 as non-habitat). 

Image processing was performed using PCIWORKS image processing software. Before 
data analysis, the images were geometrically corrected and registered using 11 ground 
control points taken throughout the river valley. Points used for referencing the image to 
a world coordinate system were large structures easily seen on the image, such as the 
corners of the Xichang airport runway, large intersections and an isolated paved village 
compound. The 103 ground/field data points were located on the image. Each snail 
habitat and non-habitat site was specified as a 3 x 3 pixel area surrounding the site 
location as determined in the field by GPS measurements. 
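
As a rough illustration of how a GPS-located site maps to its 3 x 3 pixel block, the sketch below converts map coordinates to an image row and column and lists the nine pixels of the block. The function names, the image origin and the north-up 30 m grid are assumptions made for illustration only; they are not taken from the PCIWORKS processing chain.

```python
def coord_to_pixel(easting, northing, origin_easting, origin_northing, pixel_size=30.0):
    """Convert map coordinates (m) to (row, col) on a north-up grid.

    The origin is the upper-left corner of the image (a hypothetical value here).
    """
    col = int((easting - origin_easting) // pixel_size)
    row = int((origin_northing - northing) // pixel_size)
    return row, col


def site_window(row, col, half_width=1):
    """Return the (row, col) indices of the square block centred on a site pixel."""
    return [(r, c)
            for r in range(row - half_width, row + half_width + 1)
            for c in range(col - half_width, col + half_width + 1)]


def total_pixels(n_sites, half_width=1):
    """Number of pixels contributed by n_sites non-overlapping square blocks."""
    side = 2 * half_width + 1
    return n_sites * side * side


# Example with made-up coordinates: a site 45 m east and 45 m south of the origin.
row, col = coord_to_pixel(1045.0, 4955.0, 1000.0, 5000.0)
window = site_window(row, col)
```

With 48 non-habitat and 55 habitat sites, a 3 x 3 block per site gives 432 and 495 pixels respectively, which matches the pixel totals reported in Tables 1 and 2.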

After geometric correction, a preliminary supervised maximum likelihood classification 
was performed using all TM channels from both dates. The 55 habitat and 48 non-habitat 
areas were used both to train the classification algorithm and to assess the accuracy of the 
classification. The results of this accuracy assessment are presented in the next section. 
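
The idea behind a supervised maximum likelihood classification can be sketched as follows. This is a simplified illustration, not the PCIWORKS implementation: it assumes each class is Gaussian with independent bands (a diagonal covariance matrix), whereas a standard maximum likelihood classifier uses the full covariance matrix. The function names, the variance floor and the optional distance threshold used to leave pixels unclassified are all assumptions.

```python
import math


def fit_class(samples):
    """Per-band mean and variance for one training class (list of band vectors)."""
    n = len(samples)
    means = [sum(col) / n for col in zip(*samples)]
    variances = [max(sum((x - m) ** 2 for x in col) / n, 1e-6)  # floor avoids zero variance
                 for col, m in zip(zip(*samples), means)]
    return means, variances


def log_likelihood(pixel, stats):
    """Gaussian log-likelihood of a pixel, treating bands as independent (assumption)."""
    means, variances = stats
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(pixel, means, variances))


def classify(pixel, class_stats, max_distance=None):
    """Return the most likely class label, or None if the pixel is left unclassified."""
    best = max(class_stats, key=lambda label: log_likelihood(pixel, class_stats[label]))
    if max_distance is not None:
        means, variances = class_stats[best]
        distance = math.sqrt(sum((x - m) ** 2 / v
                                 for x, m, v in zip(pixel, means, variances)))
        if distance > max_distance:
            return None  # too far from every class signature
    return best


# Made-up two-band training values, for illustration only.
class_stats = {
    "habitat": fit_class([[10, 50], [12, 52], [11, 49]]),
    "non-habitat": fit_class([[30, 20], [32, 22], [31, 19]]),
}
label = classify([11, 50], class_stats)
```

When a class actually consists of several spectrally distinct groups, a single mean and variance per class describes it poorly, which is the weakness the two-tiered approach is meant to address.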

Realizing that the accuracy of our preliminary classification was inadequate, we next 
employed a two-tiered analysis approach. The first step of this approach employed an 
unsupervised classification method called Isodata clustering to break up snail habitat and 
non-habitat classes into subclasses. The Isodata algorithm is an iterative process whereby 
the pixels of the image are grouped into clusters based on an examination of their 
multispectral brightness values. Pixels grouped into the same cluster are similar with 
respect to their spectral properties. The Isodata algorithm was first applied to those pixels 
corresponding to snail habitat sites. The algorithm was used to split the pixels into 5 
separate clusters. These 5 snail habitat clusters may correspond to different micro- 
habitats which are all suitable for snails. The Isodata algorithm was then run using the 
non-habitat sites to produce 5 non-habitat clusters. The spectral distributions for each of 
these 10 clusters were determined and used to perform the second part (i.e., supervised 
maximum likelihood classification) of this two-tiered analysis. 
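
The two-tiered procedure can be sketched as below. Plain k-means is used here as a simplified stand-in for Isodata (it omits Isodata's cluster splitting and merging), and the second tier uses a Gaussian likelihood with independent bands rather than a full covariance matrix. All function names, the variance floor and the toy data are assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (stand-in for Isodata); returns the non-empty clusters."""
    step = max(len(points) // k, 1)
    centers = [list(points[min(i * step, len(points) - 1)]) for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centre if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return [g for g in groups if g]


def gaussian_stats(group):
    """Per-band mean and variance of one cluster; variance floored at 1.0 (assumption)."""
    n = len(group)
    means = [sum(col) / n for col in zip(*group)]
    variances = [max(sum((x - m) ** 2 for x in col) / n, 1.0)
                 for col, m in zip(zip(*group), means)]
    return means, variances


def log_likelihood(pixel, stats):
    means, variances = stats
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(pixel, means, variances))


def train_two_tier(habitat_pixels, non_habitat_pixels, k=5):
    """Tier 1: cluster each class separately and keep one signature per cluster."""
    signatures = []
    for parent, pixels in (("habitat", habitat_pixels), ("non-habitat", non_habitat_pixels)):
        for group in kmeans(pixels, k):
            signatures.append((parent, gaussian_stats(group)))
    return signatures


def classify(pixel, signatures):
    """Tier 2: pick the most likely cluster and report the class it came from."""
    parent, _ = max(signatures, key=lambda s: log_likelihood(pixel, s[1]))
    return parent


# Toy data: habitat has two spectrally distinct groups with non-habitat in between.
habitat = [[10, 10], [12, 11], [11, 9], [90, 90], [91, 92], [89, 91]]
non_habitat = [[50, 50], [52, 49], [49, 51], [51, 52]]
signatures = train_two_tier(habitat, non_habitat, k=2)
```

In the toy data a single habitat signature would have its mean near the non-habitat values, whereas the per-cluster signatures keep the two habitat groups separate.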

Results 

The result of the preliminary supervised classification using all TM bands from the spring 
and fall images is presented in Table 1. For the 55 snail habitat sites, there was good 
classification accuracy, with 89.3% of the pixels being classified correctly. However, for 
the non-habitat sites there was a large number of misclassified pixels, with only 52.3% of 
the pixels being accurately classified as non-habitat. 3.4% of the pixels corresponding to 
snail habitat sites and 8.8% of the pixels corresponding to non-habitat sites were 
unclassified. 

The result of the two-tiered classification is presented in Table 2. For the pixels 
corresponding to 55 snail habitat sites, 3.6% were unclassified. Of the remaining 96.4%, 
90.3% of the pixels were correctly classified as snail habitat. For the pixels 
corresponding to 48 non-habitat sites, 4.2% were unclassified. Of the remaining 95.8%, 
86.6% of the pixels were correctly classified as non-habitat. A classification matrix 
showing the percentages of each cluster for both types of habitat is presented in Table 3. 
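
The percentages quoted in the text are expressed relative to the classified pixels, whereas Table 2 reports percentages of all pixels. The short sketch below shows how the two appear to be related; the function name is an assumption, and small differences are expected because the table entries are rounded.

```python
def accuracy_among_classified(pct_correct_of_all, pct_unclassified):
    """Percent correct among classified pixels, from percentages of all pixels."""
    return 100.0 * pct_correct_of_all / (100.0 - pct_unclassified)


# Values taken from Table 2.
habitat_accuracy = accuracy_among_classified(87.1, 3.6)      # close to the 90.3% in the text
non_habitat_accuracy = accuracy_among_classified(83.0, 4.2)  # close to the 86.6% in the text
```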

The resulting classification for the Anning Valley is shown in Figure 2. A 5 x 5 pixel 
mode filter was applied to the image for presentation. The mode filter is primarily used 
to clean up thematic maps for presentation purposes by grouping together areas that are 
predominantly snail habitat or non-habitat. In particular, for each 5 x 5 pixel area, the 
predominant class is assigned to all pixels in the area. 
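
A minimal sketch of the mode filter as described above, assuming the image is tiled into non-overlapping 5 x 5 blocks and every pixel in a block receives the block's most frequent class. The function name is hypothetical; many image processing packages instead slide the window and assign the mode to the centre pixel only.

```python
from collections import Counter


def block_mode_filter(grid, size=5):
    """Assign the most frequent class in each size x size block to all its pixels."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            cells = [(r, c)
                     for r in range(r0, min(r0 + size, rows))
                     for c in range(c0, min(c0 + size, cols))]
            mode = Counter(grid[r][c] for r, c in cells).most_common(1)[0][0]
            for r, c in cells:
                out[r][c] = mode
    return out


# A 5 x 5 block of habitat (1) with one isolated non-habitat pixel (0).
classified = [[1] * 5 for _ in range(5)]
classified[2][2] = 0
smoothed = block_mode_filter(classified)
```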




Figure 2: Three panels showing (from left) (a) Landsat TM image of the Anning River 
valley, (b) classification of habitat using Isodata and maximum 
likelihood algorithms, and (c) enlargement of valley floor showing 
mixed habitat. 




Table 1: Results of Preliminary Maximum Likelihood Classification of Snail habitat and Non-habitat Sites.

                          Total #   % Unclassified   % Classified as   % Classified as
                          Pixels    Pixels           Snail habitat     Non-habitat
48 Non-habitat Sites      432       8.8              38.9              52.3
55 Snail habitat Sites    495       3.4              89.3              7.3


Table 2: Results of Two-tiered analysis using Isodata and Maximum Likelihood Classification algorithms.

                          Total #   % Unclassified   % Classified within      % Classified within
                          Pixels    Pixels           Snail habitat Clusters   Non-habitat Clusters
48 Non-habitat Sites      432       4.2              12.6                     83
55 Snail habitat Sites    495       3.6              87.1                     9.2


Table 3: Percentage of pixels classified by cluster for snail habitat and non-habitat sites.
Discussion


Despite the fact that we limited our analysis to only those sites that were in the irrigated 
farming areas located in the river plain, there was a great deal of variability within the 
snail habitat and non-habitat sites. This was observed visually in the field as well as in 
the distributions of the spectral data. Our preliminary classifications ignored this 
variability by lumping all of the habitat sites together and all of the non-habitat sites 
together to train the classification. As a result, many of the non-habitat sites were 
assigned to the snail habitat class, and too few were assigned to the non-habitat class. 
This poor classification may be due to the existence of multiple micro- 
environments/habitats within the irrigated farming environment which each have distinct 
spectral properties. Hence, the terms "snail habitat" and "non-habitat" encompass 
distinctly different micro-environments which support, or do not support snails, 
respectively. Therefore, when either snail habitat or non-habitat is considered as a whole, 
it appears to be quite variable. 

In the two-tiered approach, we addressed the problem of multiple micro-environments by 
using the Isodata algorithm to separate the highly variable habitats into 
“relatively pure”, less variable clusters before performing supervised classification. These 
clusters were not based on field observation; rather, they were created from the spectral 
data alone. The choice of five habitat clusters and five non-habitat clusters was 
somewhat arbitrary. However, the high classification accuracy indicates that these 
numbers are not unreasonable. The number of clusters can be fine-tuned by examining 
the variability and separability between signatures. 

In addition to refining the number of clusters, we are also working on reducing the 
number of bands used to only those that add information to the classification. Once we 
have reduced the classification down to the key bands, we hope to develop an 
understanding of what the clusters correspond to in the field. 

Future Work 
Validation Study 

In our current work we assessed the accuracy of the classification only at the locations of 
the training sites. This may have resulted in artificially high accuracies. This summer we 
plan to revisit the Anning River valley to validate our two-tiered analysis with a rigorous 
field study. We intend to obtain spring and fall Landsat TM images from a more recent 
year than 1994 to repeat the two-tiered classification. This more recent classification 
would be validated in the field. In the past, however, we have had problems obtaining 
clear images for this region. If more recent images are not available, we will perform our 
validation based on our current classification of 1994 data. 

Field sites will be chosen by randomly sampling single pixel locations within the 
classified image. The sampling will be stratified across image classes. (Recall that in the 
preliminary analysis 5 image classes each were attributed to snail habitat and 
non-habitat sites.) Each sampled pixel corresponds to the geographical coordinates of a site 
where field data will be collected. Balancing sample size considerations with the 
practical implications of navigating in rice fields using GPS, approximately 100 sites will 
be visited (50 for snail habitat and 50 for non-habitat). At each site, a snail survey will be 
conducted in the surrounding 30 m x 30 m area according to the standardized sampling 
protocol employed in Sichuan. The snail survey data will be used to assess the accuracy 
of the classification map for overall misclassification, misclassification by class, and 
misclassification by site. 
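
The stratified sampling step could look like the sketch below, which draws the same number of random pixel locations from each image class. The function name, the fixed random seed and the allocation of 10 sites per class (10 classes, roughly 100 sites in total) are assumptions for illustration; the actual allocation may differ.

```python
import random


def stratified_sample(class_map, per_class, seed=0):
    """Draw up to per_class random (row, col) locations from each image class."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_class = {}
    for r, row in enumerate(class_map):
        for c, label in enumerate(row):
            by_class.setdefault(label, []).append((r, c))
    return {label: rng.sample(cells, min(per_class, len(cells)))
            for label, cells in sorted(by_class.items())}


# Toy classified image: 10 classes (rows 0-9), 20 pixels each.
class_map = [[label] * 20 for label in range(10)]
field_sites = stratified_sample(class_map, per_class=10)
n_sites = sum(len(cells) for cells in field_sites.values())
```

Each sampled (row, col) pair would then be converted to map coordinates and loaded into the GPS receiver for navigation in the field.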

The work described thus far is useful in validating the model at the level of individual 
pixels. In many cases it is more useful to validate classifications at a much larger scale. 
Our eventual goal is to extend this work to monitoring potential snail habitat formation 
throughout the Three Gorges Dam area. This is a much larger area where policy makers 
assigning resources for control and research would require knowledge of the degree to 
which villages and townships have snail habitat. With this in mind, our second goal is to 
validate our classification at different regional levels. Since at a larger scale it is difficult 
to carry out field studies, we will rely on the knowledge from anti-endemic monitoring 
stations which routinely monitor snails with coarse surveys around villages of high 
endemicity. We propose to acquire aerial photographs of the Anning River valley. With 
the assistance of the regional anti-endemic agency, areas on the photographs will be 
mapped out which correspond to snail habitat and non-habitat. Estimates will be made on 
the amount of snail habitat within each area. These areas will be compared with 
corresponding areas in the Landsat classification map, and a similarity statistic will be 
computed. This statistic will not give heavy weight to individual pixel misclassifications, but 
will indicate whether or not the majority of the region is classified similarly by both 
methods. 

Classification Using Remote Sensing Data and Supplemental Ecological Data 

Once we have verified our classification approach we intend to study the degree to which 
additional ground data might improve the classification accuracy. According to SIPD 
(1995), the ecological correlates of O. hupensis snail habitat in Sichuan include the 
existence of certain vegetation types, size and density of irrigation ditches, proximity of 
agricultural field edges, wet lowland areas, and soil moisture, type and quality, and local 
temperature. Some of this information, such as soil type, is readily available at a coarse 
scale across the Anning valley. Other kinds of information such as vegetation type and 
coverage will be collected during the randomized validation study. Local temperature 
variation is difficult to measure because of the size of the area. However, mean local 
temperature at several points throughout the valley should be available. 

The aforementioned ground data are measured on several different scales and with varying 
reliabilities. For example, soil type is a nominal variable, and percent vegetation coverage 
is a bounded interval variable. Because traditional remote sensing image analysis 
algorithms such as the maximum likelihood classifier and minimum distance classifier 
cannot be used to process nominal and ordinal data, we will analyze this additional 
ground data using several non-traditional techniques: CART (Breiman et al. 1984), logit 
regression (Chung et al. 1991), evidential reasoning (Wang et al. 1994; Gong 1996) and 
artificial neural network algorithms (Gong 1996). Each of these algorithms can handle all the 
different levels of measurement and has proven useful in classification tasks where 
similar issues existed. In particular, when mapping 4 geological types in Northern 
Canada using Landsat TM data, gravity anomaly, potassium radiometric and 
aeromagnetic data, Gong (1996) obtained high classification accuracies using evidential 
reasoning and neural networks. The data in that study had different spatial qualities, and 
all the data other than the remote sensing data were essentially point-based sample data. 
They had to be resampled through spatial interpolation for use with the Landsat 
TM data. When mapping 29 ecological classes in Alberta, Canada using forest species, 
crown closure and size and digital elevation data, Gong (1996) assessed the potential of 
neural networks. Forest species data are of nominal measurement scale. The experience 
gained in these other studies will prove useful in this proposed analysis. 

Developing an ecological interpretation of the classification algorithms is central to being 
able to extrapolate the use of the algorithm to different areas. From the perspective of the 
RS information, the key will be to develop a physically based interpretation of the 
classification data for identifying the underlying ecological factors being sensed. In order 
to accomplish this, we will address several potential problems. One problem is that the 
remote sensing data come from different areas, each subject to different surface properties 
and atmospheric irradiance. Since we will select images from clear sky conditions, we 
will employ a simple atmospheric correction algorithm developed for correcting the 
molecular and aerosol effects that dominate the clear sky condition (Forster 1984; Liang 
et al. 1997). Because the snail habitat areas used in this study are located in relatively flat 
areas, the effect of illumination variation and shadowing can be safely ignored. Another 
potential problem is that the spectral signal of a pixel in each band carries spectral 
contributions from various surface cover types within approximately a pixel. We will use 
multivariate piecewise regression algorithms to investigate the relationship between 
various surface cover conditions combined with the modification by surface topography 
and the spectral values from various snail habitat and non-snail habitat sample areas. 
Statistical regression algorithms will be useful in revealing the dominant factors that 
cause the spectral differences between snail habitat and non-habitat areas. The results 
from such quantitative studies will help verify and improve our understanding of the 
ecological conditions for the habitat of different snail subspecies. Furthermore, the 
analysis results will help us in developing snail habitat indicators based primarily on 
spectral properties and the derivatives such as landscape features obtained from remotely 
sensed data. 

Other Image Sources and the Identification of Landscape Features Relating to Disease 

The work described thus far has focused on locating snail habitat. There are locations 
where snails exist but no disease transmission occurs. It is unclear why this is the 
case. It is clear, however, that on a local scale infection intensity and disease prevalence 
are related to the relationships between people, animals, and snails, as they may be 
mediated by landscape features. These landscape features include crop type, the nature 
and density of irrigation in villages, and the proximity and density of settlements. In 
addition, topographical features such as slope and aspect determine the flow of water 
channels, which in turn may influence the transmission of disease. Therefore, it is of 
considerable interest to determine if topographical or landscape features that can be 
determined remotely are correlates of transmission, for such information would further 
inform remote surveillance programs for prioritizing locations within the Three Gorges 
region for intensive ground investigation. To investigate these questions, higher 
resolution images than those from Landsat TM would be necessary. 

We propose to analyze the relationships between a variety of landscape features in areas 
of known snail habitat with the level of disease prevalence using both ground and RS 
data. Snail habitat and density will be predicted using data collected as part of the 
validation fieldwork described above. We intend to obtain prevalence data at 
approximately 20 villages throughout the valley where data is available from the local 
anti-endemic authority. Landscape features, such as crop type, the nature and density of 
irrigation in villages, proximity and density of settlements, and topographic slope and 
aspect will then be analyzed to investigate the relationship of these landscape features to 
estimates of disease prevalence treated as a continuous variable. 

After determining the degree to which different landscape and topographical features 
relate to disease prevalence, we will then evaluate how remote sensing techniques can be 
used to identify these features. Without using remote sensing, landscape features such as 
the structure, location and density of human settlements, locations of roads and even 
details of irrigation are obtainable in a variety of ways. For established areas, good maps 
are usually available and these can be digitized. But for working on a regional scale, 
map-based information is labor intensive to obtain and prone to errors at several stages 
during digitization. In the new settlements within the Three Gorges area such 
information will not be available for some time. Locations of settlements can be obtained 
with a GPS in the same way that habitat sites are located, or they could be identified from 
imagery. Detailed landscape information such as the proximity of field edges or density 
of irrigation ditches is more difficult to obtain. Such data is problematic for this study 
and its inclusion will depend upon the ease of obtaining the data. 

As an example, ditch density estimates can be obtained in several ways. An estimate at a 
village scale can be obtained from the anti-endemic authority during the field studies. A 
more objective estimate could be obtained by digitizing maps from the local Irrigation 
Bureau and spatially aggregating the information to add them to the analysis. The 
average density of irrigation ditches of various widths could be calculated at spatial 
aggregates comparable to the size of the 3 x 3 pixel (90 m x 90 m) aggregates used in the 
original classification described above. Because of the labor involved in working with 
detailed maps on this scale, this is the least attractive method. 
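
As one possible way to aggregate digitized ditch maps to 90 m x 90 m blocks, the sketch below sums ditch length per block and divides by the block area. The segment representation, the function name and the rule of crediting each segment to the block containing its midpoint are assumptions made for illustration; a real implementation would clip segments at block boundaries and could keep separate totals by ditch width.

```python
import math
from collections import defaultdict


def ditch_density(segments, cell_size=90.0):
    """Ditch length per unit area (m per square metre) for each square block.

    Each segment is ((x1, y1), (x2, y2)) in metres. A segment is credited to the
    block containing its midpoint, which is a simplification.
    """
    length = defaultdict(float)
    for (x1, y1), (x2, y2) in segments:
        mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2
        cell = (int(mid_x // cell_size), int(mid_y // cell_size))
        length[cell] += math.hypot(x2 - x1, y2 - y1)
    return {cell: total / cell_size ** 2 for cell, total in length.items()}


# Made-up ditch segments: two in the first block, one in a neighbouring block.
segments = [((0, 0), (90, 0)), ((10, 10), (10, 50)), ((100, 100), (130, 140))]
density = ditch_density(segments)
```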



Presently there are two types of satellite imagery that can be used for obtaining 
topographical features: 10 m resolution SPOT HRV-PAN imagery and 6.25 m IRS-1D 
imagery (resampled to 5 m). With these high resolution satellite data, we can obtain 
topographic features through automatic processing of stereo pairs of these images 
(images taken from different view angles of the same region). This is accomplished by 
digital photogrammetry (Saleh et al. 1994). Digital photogrammetry is a promising tool 
for mapping plain and valley bottom areas that are of interest to this study. Such 
capability has been developed rapidly over the past 10-20 years and is now mature enough to 
extract digital surface elevation from aerial photographs and satellite imagery. With 
digital photogrammetry applied to high resolution stereo-imagery, one can extract both 
the horizontal and vertical coordinates for any point in either stereo image, producing a 
highly accurate three-dimensional digital surface model (DSM). Our previous experience 
with monitoring the change of hardwood rangeland in California indicates that better than 
2-3 m accuracies in both the horizontal and vertical directions are possible with 1 m 
resolution imagery (Lee 1997; Mostafa et al. 1997). From a DSM, landscape features like 
slope, aspect, and concavity or convexity of areas are easily extracted. At Berkeley, 
we have expertise in digital photogrammetry and the necessary professional software 
packages for processing this data (e.g., VirtuoZo from Jetway Inc., and OrthEngine from 
PCI Inc.). 
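
To illustrate how slope and aspect follow from a DSM, the sketch below uses central differences on a regular elevation grid. It is a minimal example under stated assumptions (row 0 is the northern edge, cells are square, elevations and cell size are in the same units), not the method used by the software packages named above; the function name is hypothetical.

```python
import math


def slope_aspect(dsm, r, c, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) at an interior cell.

    Aspect is the downslope direction; it is not meaningful for a perfectly flat cell.
    """
    dz_dx = (dsm[r][c + 1] - dsm[r][c - 1]) / (2 * cell_size)  # change toward the east
    dz_dy = (dsm[r - 1][c] - dsm[r + 1][c]) / (2 * cell_size)  # change toward the north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Toy surface rising 1 m per 1 m cell toward the east, so it faces west.
east_ramp = [[float(c) for c in range(3)] for _ in range(3)]
slope, aspect = slope_aspect(east_ramp, 1, 1, cell_size=1.0)
```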

In the next couple of years, we will have available not only improved spectral imaging 
capabilities, such as MODIS with 36 spectral bands, but also improved spatial 
resolution from commercial satellites, such as the approximately 1 m resolution capabilities 
of Space Imaging and Earth Watch. Both improved spectral imaging and higher 
resolution data will allow us to better identify landscape features. With 1-5 m resolution 
satellite data, ditches of varying widths and types, edges of agricultural fields, coverage 
of vegetation, surface roughness, land-cover and land-use patterns and the exact spread of 
villages can all be extracted with improved accuracy. In particular, we will explore the 
use of linear feature extraction algorithms and the gradient profile modeling algorithm 
(Wang 1993). The gradient profile modeling algorithm has been used successfully in 
drainage and road network extraction (Gong et al. 1997) with remote sensing imagery of 
different spatial resolution varying from 1.6 m to 30 m. 